Leveraging Gen AI

Ethiopian Community Gathering

Eyayaw Beze

July 13, 2025

Intro

  • ChatGPT (Nov 2022)

    • Hallucinate, training cutoff
    • Expensive and fewer options
  • State-of-the-Art (SOTA) Large Language Models (LLMs) are powerful and feature-rich

    • Hallucinate less, realtime info, better instruction following
    • Good at coding and complex problem-solving
    • Multi-modal (text, images, audio, video)
    • Tool use and reasoning, and agentic capabilities

Major Providers

  • OpenAI1 — GPT Models (4.1, o3)
  • Anthropic — Claude Models (Sonnet/Opus 4)
  • Google — Gemini Models (2.5 Pro), Gemma
  • xAI — Grok Models (4)
  • Perplexity — Sonar
  • Meta — Llama Models (4)
  • Microsoft — Phi
  • Mistral — Mistral
  • Alibaba — Qwen
  • Deepseek2

Unlocking the power of LLMs

  • Rich builtin features

    • Chat, summarization, writing, translation, search, coding
    • Canvas/Artifacts, projects, deep research, data analysis
    • Agents, MCP, CLIs (Vibe coding)
  • Customizing llms (“context”: adding knowledge)

    • System instruction, RAG, tool use, MCP

    \rightarrow Endless use cases (tutor, career coach, …)

  • My use cases

    • Improving writing, and fixing grammar mistakes/typos

      Example

      here are my questions 1,How could I wine the scholarship regarding your experience 2, What are the easy ways to get scholarships 3,If you have Templetes of documention prepartion to prepare my owen 4,If you Know sites to Apply scholarships 5,If you have any options which you advise me to get this chance.Thanks!!!

    • Write a cover letter:

      Prompt

      You'll primarily act as a career coach, helping me write tailored cover letters based on the context provided in this project: CV, motivation statement, and dissertation summary. I'll provide the job description for each position. Please sound less robotic and avoid clichés, usual, flashy words and phrases. You do this task when I use the flag @cl.
      
      Occasionally, you'll be asked to answer application form questions such as "Why this role?" In such cases, use the information provided in the project context to formulate appropriate responses. The flag for this task is @answer.
      
      If you notice that I’m not eligible or a good fit for the position for any reason, please state it upfront.
      
      PROVIDE ALL YOUR COMMENTARY AT THE BEGINNING. NO COMMENTARY SHOULD BE ADDED AFTER THE COVER LETTER.
      
      <LINKS>
      1. Project 1: https://github.com/eyayaw/housing-supply-elasticity-in-germany
      2. Project 2: https://github.com/eyayaw/de-donut-effect
      3. Project 3: https://github.com/eyayaw/the-monocentric-city-gradients-addis-ababa
      4. Dissertation: https://github.com/eyayaw/dissertation
      </LINKS>
      
      <STYLE_GUIDE>
      1. Use American English
      2. DON'T USE punctuation inside quotations like <"Professionalism," and ...>, rather use <"Professionalism", and ...>.
      3. MINIMIZE the use of bullet points and em dashes
      </STYLE_GUIDE>
      
      Please embed links, for research projects and the dissertation repo, in markdown format. If asked explicitly to output Typst, use this format #link("https://github.com/eyayaw/housing-supply-elasticity-in-germany")[housing supply elasticity].

Deep Research

  • An agent that does a multi-step research for complex tasks.

Research Assistant

Browses the web and analyzes various sources on your behalf, assist you with your in-depth research.

  • Takes a prompt \rightarrow drafts a research plan
  • Finds relevant resources (reasons + iteratively)
  • Provides a comprehensive report
Resources

Question

Have you called llms outside of their chat interfaces?

  • OCR, structured data extraction
  • Copilot + agentic-coding (in IDEs), Vibe coding

Data Extraction

  • Document understanding: Digitize text from docs (images/scans, videos, PDFs etc.)
Prompt Extract the data to a valid CSV format that is safe for parsing. Output to a tidy (long)-format with columns: country, area, pop, hospitals, health_centers, health_stations, beds
Category, Ethiopia, USA
Sq. Miles, 489239, 3608787
Population, 25008631, 205000000
Hospitals, 88 hospitals with 9449 beds,
Health Centers, 81,
Health Stations, 542 (bush clinics),
R
```{r}
data = read.csv("demo/data/mettu_1972.csv")
data_long = read.csv("demo/data/mettu_1972_long.csv")
```
Category Ethiopia USA
Sq. Miles 489239 3608787
Population 25008631 205000000
Hospitals 88 hospitals with 9449 beds NA
Health Centers 81 NA
Health Stations 542 (bush clinics) NA
country area pop hospitals health_centers health_stations beds
Ethiopia 489239 25008631 88 81 542 9449
USA 3608787 205000000 NA NA NA NA
  • Unstructured data \rightarrow Structured output

    Gemini API: Provide a schema (responseSchema), or in a text prompt

real_estate_ad
"""ሁል የተሟላለት ቅንጡ የመኖሪያ ቤት ኪራይ ሲምሲ ኮፓውንድ ውስጥ ካሬ ሜትር 600
መኝታ ቤት 6 ሳሎን 2 ኪችን2 ሻውር ሽንት ቤት 5 በቂ የመናፈሻ ስፍራ መኪና ማቆሚያ 6
ዋጋ140 ሽ ኮሚሽን 10% 0911067686x"""
<json_output>
{
         "property_type": "house",
         "property_use": "residential",
         "listing_type": "rent",
         "price": {
             "value": 140000.0,
             "currency": "ETB",
             "unit": "per month"
         },
         "address": {
             "original": "ሲኤምሲ ኮምፓውንድ",
             "transliterated": "CMC Compound"
         },
         "size": {
             "plot_area": 600.0,
             "unit": "sqm"
         },
         "property_condition": "excellent",
         "bedrooms": 6,
         "bathrooms": 5,
         "furnishing_status": "fully-furnished",
         "structural_features": {
             "has_parking": true,
             "parking_spaces": 6,
             "has_garden": true
         },
         "seller": {
             "type": "broker",
             "contact": {
                 "phone": "0911067686"
             },
             "commission": 10.0
         },
         "description": "Fully equipped luxury residential house for rent located inside CMC compound. The property has 6 bedrooms, 5
 bathrooms, 2 living rooms, and 2 kitchens. It includes ample garden space and parking for 6 vehicles.",
         "remarks": "The advertisement explicitly mentions 2 living rooms and 2 kitchens, which are notable features."
     }

Advanced usage

  1. AI Playgrounds:

  2. Code CLIs (programming)

Discussion



Ideas
  1. What are your use cases? What role does it play in your field?
  2. Do you think AI will take our jobs?
  3. What role does AI play in economic development?

Appendix

LLM Benchmarks and Leaderboards

Benchmarks

  • Humanity’s Last Exam, a multi-modal benchmark at the frontier of human knowledge, 2,500 challenging questions across over 100 subjects. (Dataset)

  • MMLU – Measuring Massive Multitask Language Understanding (Dataset)

    57 subjects (STEM, law, etc.), 16k multiple-choice questions, checks breadth of factual knowledge and multiple-choice reasoning accuracy.

  • BIG-Bench (+ variants): 204 tasks, diverse topics (including tasks which quantify social bias in language models) (Repo, BBH), task contribution from the community

  • ARC Challenge “ARC-AGI is the only AI benchmark that tests for general intelligence by testing not just for skill, but for skill acquisition.”

  • LongBench v2: Benchmarking deeper understanding and reasoning on realistic long-context multitasks. (See also Needle-in-a-Haystack)

    Contexts up to 2 M words across QA, summarisation, code-repo understanding; probes long-context recall, retrieval and deep reasoning.

Leaderboards

  • Artificial Analysis: Comparison of over 100 AI models
  • LMArena: based on 3.5M blind user votes \rightarrow what the general ai community thinks is the “best model”.
    Benchmarks can be reverse-engineered Companies could easily include questions and answers into their models’ training data. (See MMLU’s limitation)
  • ARC Prize

Considerations

  • The problem you are going to solve.
  • Performance, speed, and cost of the model